Phishing attacks are a significant cybersecurity threat because they trick people into revealing personal information through fake websites. This project introduces an integrated CNN-LSTM model to detect phishing URLs. It uses Convolutional Neural Networks (CNNs) to find local patterns and Long Short-Term Memory (LSTM) networks to model the order of information in URLs. To make the model's predictions interpretable, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) are applied, giving insight into how each prediction is made. The trained model is served as a FastAPI/Flask web service, enabling real-time URL analysis. A browser extension communicates with the API to detect phishing on the fly as users browse the web. The system returns explanations alongside its predictions, which improves user trust and security awareness. By integrating deep learning, explainability, and web deployment, this project provides a practical and scalable cybersecurity solution, with potential for further improvement using Graph Neural Networks (GNNs), Reinforcement Learning, or Meta-Learning.
Introduction
With the rise of internet use, phishing attacks, in which fake websites steal sensitive information, have increased. Traditional rule-based detection methods struggle to keep up with evolving phishing tactics. To address this, the project proposes a hybrid deep learning model that combines CNNs (for extracting local URL patterns) with LSTMs (for capturing sequential dependencies), with the aim of improving detection accuracy and robustness against new phishing URLs.
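Before a CNN-LSTM model can read a URL, the string has to be turned into a fixed-length numeric sequence. The sketch below shows one common way to do this, character-level integer encoding with padding. The source does not state which tokenization is used, so the vocabulary, the reserved IDs, and the `MAX_LEN` value are all assumptions made for illustration.

```python
import string

# Assumed vocabulary: lowercase letters, digits, and URL punctuation.
# ID 0 is reserved for padding and ID 1 for characters outside the vocabulary.
VOCAB = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}
PAD_ID, UNK_ID = 0, 1
MAX_LEN = 200  # hypothetical fixed input length


def encode_url(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a URL to a fixed-length list of integer character IDs."""
    # Lowercase, truncate to max_len, and replace unknown characters with UNK_ID.
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in url.lower()[:max_len]]
    # Right-pad with PAD_ID so every URL yields a sequence of the same length.
    return ids + [PAD_ID] * (max_len - len(ids))
```

In a typical setup, a sequence like this would feed an embedding layer, followed by convolutional layers that pick up short character patterns and an LSTM layer that models their order.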
Explainability is provided by integrating SHAP and LIME, which let users see why a URL is flagged and so increase trust in the system. The model is deployed as a FastAPI/Flask web service for real-time URL classification and is integrated into a browser extension that gives users instant phishing warnings.
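The idea of returning an explanation together with a prediction can be sketched with a simple occlusion test: mask one part of the URL at a time and record how much the phishing score drops. This is a simplified perturbation method in the spirit of LIME, not the SHAP or LIME libraries themselves. The `toy_score` function is a stand-in for the trained model, and the response field names are hypothetical.

```python
def toy_score(url: str) -> float:
    """Stand-in for the trained model: returns a phishing score in [0, 1]."""
    suspicious = ("login", "verify", "@", "secure")
    hits = sum(token in url.lower() for token in suspicious)
    return min(1.0, 0.25 * hits)


def occlusion_attributions(url, score_fn, window=5, mask_char="_"):
    """Return (segment, score_drop) pairs, one per masked window of characters."""
    base = score_fn(url)
    results = []
    for start in range(0, len(url), window):
        segment = url[start:start + window]
        # Replace the segment with mask characters and re-score the URL.
        masked = url[:start] + mask_char * len(segment) + url[start + window:]
        results.append((segment, base - score_fn(masked)))
    return results


def build_response(url, score_fn, threshold=0.5, top_k=3):
    """Assemble the payload a web service route might return (hypothetical schema)."""
    score = score_fn(url)
    ranked = sorted(occlusion_attributions(url, score_fn),
                    key=lambda pair: pair[1], reverse=True)
    return {
        "url": url,
        "phishing_score": score,
        "label": "phishing" if score >= threshold else "legitimate",
        "top_segments": [{"segment": s, "score_drop": d} for s, d in ranked[:top_k]],
    }
```

A function like `build_response` could sit behind a FastAPI or Flask route, with the browser extension displaying the label and highlighting the segments that contributed most to the score.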
Existing systems mainly rely on blacklists, heuristics, or traditional machine learning models that require manual feature engineering and often lack real-time detection or explainability. The proposed CNN-LSTM model overcomes these limitations by automatically learning complex URL features and offering transparent, interpretable predictions.
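To make the contrast concrete, the sketch below shows what the conventional approaches look like in miniature: an exact-match blacklist lookup and a handful of hand-written lexical features. The blacklist entry and the feature set are illustrative assumptions, not taken from any particular system.

```python
import re
from urllib.parse import urlparse

BLACKLIST = {"phish.example.net"}  # hypothetical blacklist entry


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup of the URL's hostname in the blacklist."""
    return (urlparse(url).hostname or "") in BLACKLIST


def lexical_features(url: str) -> dict:
    """Hand-engineered features of the kind used by traditional classifiers."""
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "has_at_symbol": "@" in url,
        # Pattern check only: does the host look like a dotted IPv4 address?
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
    }
```

The blacklist flags `http://phish.example.net/login` but misses the near-identical `http://phish-example.net/login`, and every new evasion trick requires someone to write a new feature by hand. A model that learns features directly from the URL string avoids that manual step.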
The system demonstrates high accuracy, low latency, and practical real-time usability. Its combination of deep learning performance, explainability, and browser integration makes it a more effective and user-friendly solution for phishing prevention compared to conventional methods.
Conclusion
The proposed phishing detection system combines deep learning, explainability, and real-time deployment to provide a comprehensive security solution. The hybrid CNN-LSTM model achieves high accuracy by capturing both local and sequential features of URLs. SHAP and LIME make the system more trustworthy for end users. FastAPI integration and a browser extension enable smooth real-time detection and user alerts. Compared to traditional methods, this system is more adaptive, user-friendly, and robust against modern phishing threats. It lays a strong foundation for future enhancements using advanced AI techniques.